Abstract

It is frequently suggested that predictions made by game theory could be improved by 
considering computational restrictions when modeling agents. Under the supposition that 
players in a game may desire to balance maximization of payoff with minimization of strategy
complexity, Rubinstein and co-authors studied forms of Nash equilibrium where strategies
are maximally simplified in that no strategy can be further simplified without sacrificing
payoff. Inspired by this line of work, we introduce a notion of equilibrium whereby strategies
are also maximally simplified, but with respect to a simplification procedure that is
more careful in that a player will not simplify if the simplification incents other players to 
deviate. We study such equilibria in two-player machine games in which players choose 
finite automata that succinctly represent strategies for repeated games; in this context, we 
present techniques for establishing that an outcome is at equilibrium and present results on 
the structure of equilibria. 

1 Introduction 

A frequently raised criticism of game theory is that its predictions clash with empirical observations as 
a consequence of being based on the assumption that agents possess and use unbounded computational 
power. This criticism has motivated the introduction and investigation of so-called models of bounded 
rationality [28], in which limits on agents' computational power are taken into account. A number of different
approaches to the study of bounded rationality have been suggested [3, 7, 17, 24, 26, 27]. One model that
has received considerable attention is a machine game in which players choose finite-state automata that
succinctly represent strategies for repeated games [18, 21, 24, 26, 1]; the model of finite-state automata
can be taken as a formalization of players having bounded-size memory, and is well-studied in computer 
science. 

Rubinstein with co-authors Abreu and Piccione [26, 1, 25], in the context of the machine game, proposed
and studied forms of Nash equilibrium under which strategies are maximally simplified in the sense that a 
player's strategy cannot be simplified without reducing his payoff. A supposition basic to this work is that 
players desire to minimize the complexity of their strategies, and hence in choosing strategies are concerned 
with balancing the maximization of payoff with the minimization of strategy complexity. Simplicity of 
strategies may be valued for a number of reasons; for instance, complex strategies may be more expensive to 
execute, more likely to break down, harder to learn, or costly to maintain [22]. Following Rubinstein [26],
it can be suggested that such maximally simplified equilibria resemble phenomena observed in real life: 
institutions, organizations, and human abilities may degenerate or be reduced if they contain unnecessary or
redundant components. 

The Rubinstein-Abreu-Piccione line of work, more specifically, studies Nash equilibria where the players
have preference relations that increase in the payoff and, when payoff is maintained, decrease in the
complexity. While one can study the equilibria they define in any game where there is a complexity measure 
associated with the actions of each player, their work focuses on the mentioned machine game, and the 
complexity measure studied is the number of states of an automaton, which can be viewed as the memory 
size of a strategy. 

Inspired by this line of work, we introduce and study a new notion of equilibrium intended to capture 
maximally simplified strategies, but with respect to a more careful, conservative simplification procedure. 
The motivation for this new equilibrium notion stems from the observations that, in the Rubinstein model, a 
player simplifies without considering whether or not the simplification may incent other players to deviate, 
and that this liberal mode of simplification may spoil desirable outcomes that are Nash equilibria in the 
usual payoff sense. These observations can be illustrated by the following example. Consider the so-called 
grim trigger strategy in the infinitely repeated Prisoner's dilemma; this strategy, which can be implemented 
by a two-state automaton, is to cooperate until the other player is seen to have defected, and to then defect 
indefinitely. While this strategy paired with itself is a Nash equilibrium in the payoff sense (no player
can unilaterally deviate and increase his payoff), it is not an equilibrium in the Rubinstein sense, since either
player could maintain payoff but reduce complexity by switching to a strategy that always cooperates (which 
is a strategy that can be implemented by a one-state automaton). Notice, however, that such a switch would 
in turn incent the other player to change to a strategy that always defects against cooperation, thus spoiling 
the cooperation. It can in fact be verified that no pair of strategies that cooperate indefinitely form an 
equilibrium in the Rubinstein sense. 

Whereas in the Rubinstein model a player will simplify his strategy so long as he can maintain his 
payoff, in our model each player is forward-looking, and will only simplify his strategy if, in addition, 
no other player can profitably deviate post-simplification. That is, in considering simplifications, players 
are averse to potential payoff-motivated deviations by other players. Our notion of equilibrium, which we 
call lean equilibrium, is thus defined as an outcome of strategies at Nash equilibrium such that no player 
can both individually simplify his strategy and preserve the property of being at Nash equilibrium. The 
described grim trigger strategy paired with itself does constitute a lean equilibrium in the infinitely repeated 
prisoner's dilemma: the described strategy has two states, so any simplification must have one state; in order 
for the result to be a Nash equilibrium, the player with one state must always cooperate in order to be a best 
response to the other player; but, this is not a Nash equilibrium as the other player could then profitably 
deviate by always defecting. 

We present results on lean equilibria for two-player machine games where each player chooses a finite-state
automaton representing a strategy in an infinitely repeated game. We study three complexity measures;
in addition to studying the "number of states" measure, we study two measures that we introduce. One is 
based on the number of states, but does not count threat states, and the other counts the number of transitions 
to non-threat states; the precise definitions appear later in the paper. Our primary technical results are the 
following. 

• We give techniques for establishing that outcomes are at lean equilibrium, and illustrate their use by a
number of examples (Section 5).

• We present results on the structure of machines that are at equilibria, and, with respect to the
number-of-transitions measure, give a precise description of the equilibria structure. This description in fact
shows that the machine structure can be inferred by a third-party observer that only views the
induced sequence of action pairs (Section 6).

We believe that the developed theory evidences that the two introduced complexity measures are natural and 
mathematically robust. 

While the present work focuses on machine games with finite automata and was certainly inspired by 
previous work on such games, we want to emphasize that the notion of lean equilibrium is defined in a very 
general way (Section 3) and can be applied to any game in which there is a notion of complexity associated
with the players' actions. Indeed, our view is that one of the most promising avenues for future work is to 
analyze the lean equilibria in other types of games where such a notion of complexity is present or can be 
naturally defined; in particular, it could be of interest to study games arising directly from real-life situations 
and phenomena. We believe that the theory and results developed in this work vindicate the introduced
equilibrium notion as a tangible and robust mathematical concept of which one can hope to present analysis 
in further games. 

Related work. The present work is a contribution to the study of bounded rationality; surveys and general 
references include [27], [17], [3]. The present article can in particular be taken as following a body of research
where players in games are represented using models of computation; here, we briefly describe some of this 
research. 

The study of machine games where players select automata representing strategies in repeated games 
was initiated early in the study of bounded rationality. The already described work of Rubinstein, Abreu, 
and Piccione studied Nash equilibria where players' preferences take into account strategy complexity in 
addition to repeated game payoff. The paper of Rubinstein [26] studied an equilibrium concept in the spirit
of subgame perfect equilibrium, obtaining structural results on such equilibria. Abreu and Rubinstein [1]
gave general structural results on equilibria and studied the payoff sets of certain 2-by-2 games; and
Piccione and Rubinstein [25] studied equilibria in repeated extensive games. Banks and Sundaram [4] studied an
equilibrium notion similar to that considered in these papers, but focused on a "transitional" notion of
strategy complexity that accounts for the amount of opponent monitoring required, and can differentiate among
strategies with the same number of states. Kalai and Stanford [18] gave a characterization of the
number-of-states complexity measure for automata via an analog of the Myhill-Nerode theorem, and studied subgame
perfect equilibria in infinitely repeated games from the viewpoint of this measure. A number of works
studied repeated games played by finite automata where bounds on the strategy complexity are imposed
exogenously, including the articles of Neyman [20], Ben-Porath [6], Papadimitriou and Yannakakis [24],
and Neyman [21]; one focus of study is the set of payoffs sustainable in equilibrium. Gilboa [13],
Papadimitriou [23], and Ben-Porath [5] studied the computational complexity of problems involving the computation
of best response automata. Spiegler [29, 30] presented equilibrium notions motivated by the idea that players
may need to justify their strategies; he modeled players as finite-state automata.

Another line of work studies games where players must employ computable strategies. Some of the
initial work explored basic consequences of this modeling and invoked notions and ideas from computability
theory, including the contributions of Binmore [8], Canning [10], and Anderlini [2]. Megiddo and
Wigderson [19] presented results on games played by Turing machines where the number of states is
restricted. Howard [16], Tennenholtz [32], and Fortnow [11] showed existence of equilibrium and folk-theorem-style
results by making use of self-reference ideas. Gossner [14] studied repeated games played by
polynomial-time Turing machines, invoking cryptographic assumptions to obtain results on the equilibria
achievable under public communication.

More recent work includes the following. A framework for games where the actions have associated
costs was proposed and studied by Ben-Sasson, Kalai, and Kalai [7]. Halpern and Pass [15] presented and
studied a machine game on Turing machines where utilities can be a function of machine complexities in 
addition to the action profile; among other issues, they study existence of equilibria and notions of protocol 
security. Fortnow and Santhanam [12] introduced and studied a machine game model where players' actions
are probabilistic Turing machines that output actions in an underlying game; the payoff associated with a 
machine is discounted by the computation time used to produce actions. Their results include connections 
between the existence of Nash equilibria in the so-called factoring game and the computational complexity
of factoring, and general sufficient conditions for the existence of equilibria. 

2 Preliminaries 

In this section, we review some basic notions to be used. Our notation and terminology are standard, and for 
the most part follow typical conventions such as those described in the text by Osborne and Rubinstein [22].
A strategic game is a tuple $(N, (A_i), (\succsim_i))$ consisting of a set $N = \{1, \ldots, n\}$ of players, a nonempty set
$A_i$ of actions for each player $i \in N$, and a preference relation $\succsim_i$ defined on $A = \times_{j \in N} A_j$ for each player
$i \in N$. For the most part, we will focus on two-player strategic games where the preference relations are
specified by payoff functions $u_i : A \to \mathbb{R}$. Recall that a Nash equilibrium of a strategic game $(N, (A_i), (\succsim_i))$
is a profile $a^* \in A$ of actions such that for all $i \in N$ and for all $a_i \in A_i$, it holds that
$(a^*_{-i}, a^*_i) \succsim_i (a^*_{-i}, a_i)$.

For our purposes in this paper, a convex combination of vectors $x_1, \ldots, x_d \in \mathbb{R}^m$ is a vector of the form
$\alpha_1 x_1 + \cdots + \alpha_d x_d$ where the $\alpha_n$ are rational coefficients with $\sum_{n=1}^{d} \alpha_n = 1$ and $0 \le \alpha_n \le 1$ for each $n$.

We will make use of the following payoff notions. Let $G = (N, (A_i), (u_i))$ be a strategic game. A
feasible payoff profile of $G$ is a convex combination of the vectors $\{u(a) \mid a \in A\}$. We define player $i$'s
minmax payoff, denoted $v_i$, to be the lowest payoff that the other players can force upon player $i$, that is,
$v_i = \min_{a_{-i} \in A_{-i}} \max_{a_i \in A_i} u_i(a_{-i}, a_i)$. A feasible payoff profile $w \in \mathbb{R}^n$ of $G$ is called enforceable if
$v_i \le w_i$ for all $i \in N$, and is called strictly enforceable if $v_i < w_i$ for all $i \in N$. See Osborne and
Rubinstein [22, Section 8.5] for more information on these notions.
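
As an informal illustration of these notions, the following Python sketch computes pure-action minmax payoffs and tests (strict) enforceability in a two-player game. The payoff table is the Prisoner's Dilemma used later in the paper; the function names (`minmax_payoff`, `average_payoff`, `is_enforceable`) are hypothetical helpers introduced only for this example, and the convex combination is restricted to equal rational weights for brevity.

```python
from fractions import Fraction

# Prisoner's Dilemma payoffs: u[(a1, a2)] = (payoff to player 1, payoff to player 2).
PD = {
    ("C", "C"): (2, 2), ("C", "D"): (-1, 3),
    ("D", "C"): (3, -1), ("D", "D"): (0, 0),
}

def minmax_payoff(u, i):
    """Pure-action minmax payoff v_i for player index i (0 or 1)."""
    own = sorted({profile[i] for profile in u})
    other = sorted({profile[1 - i] for profile in u})

    def payoff(a_own, a_other):
        profile = (a_own, a_other) if i == 0 else (a_other, a_own)
        return u[profile][i]

    # The opponent picks a_other to minimise player i's best reply.
    return min(max(payoff(a, b) for a in own) for b in other)

def average_payoff(u, pairs):
    """Convex combination of payoff vectors with equal weights 1/len(pairs)."""
    return tuple(Fraction(sum(u[p][i] for p in pairs), len(pairs)) for i in (0, 1))

def is_enforceable(u, w, strict=False):
    """Check v_i <= w_i (or v_i < w_i when strict) for both players."""
    v = [minmax_payoff(u, i) for i in (0, 1)]
    return all(v[i] < w[i] if strict else v[i] <= w[i] for i in (0, 1))

w = average_payoff(PD, [("C", "C"), ("D", "D")])
print(w, is_enforceable(PD, w, strict=True))
```

Under these assumptions, alternating mutual cooperation with mutual defection gives a strictly enforceable profile, whereas the profile of mutual defection alone is enforceable but not strictly so.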

3 Lean equilibrium 

We define a complexity order on a set of actions $A_i$ to be a binary relation on $A_i$. We will consider games
where each player $i \in N$ has a complexity order $\trianglelefteq_i$ associated to his set of actions $A_i$; the intended
interpretation is that $b_i \trianglelefteq_i a_i$ if player $i$ considers the action $b_i$ to have the same complexity as or lower
complexity than action $a_i$. In such games, for $a_i, b_i \in A_i$, we will write $b_i \triangleleft_i a_i$ to denote that $b_i \trianglelefteq_i a_i$ holds
and $a_i \trianglelefteq_i b_i$ does not hold. Also, for $a, b \in A$, we will write $b \trianglelefteq a$ to denote that for all $i \in N$, it holds that
$b_i \trianglelefteq_i a_i$. We remark that in studying machine games, each complexity order that we consider arises from
associating machines with elements in a total order; however, for broadest applicability, the results in this
section (in particular, Proposition 3.2) are presented for more general settings.

Definition 3.1 Let $G = (N, (A_i), (\succsim_i))$ be a strategic game with complexity orders $(\trianglelefteq_i)$. A profile $a^* \in A$
of actions is a lean equilibrium of the game $G$ if $a^*$ is a Nash equilibrium, but for all $i \in N$ and for all
$a_i \in A_i$, if $a_i \triangleleft_i a^*_i$, then $(a^*_{-i}, a_i)$ is not a Nash equilibrium. □

We now present a basic property of lean equilibrium, namely, the existence of lean equilibria under the 
assumption of the existence of Nash equilibria and a mild assumption on the complexity orders. 

Proposition 3.2 Suppose that $G = (N, (A_i), (\succsim_i))$ is a strategic game with complexity orders $(\trianglelefteq_i)$ that are
transitive and are well-founded in the sense that for all $i \in N$ and for all $a_i \in A_i$, there exists a bound on
the length of a chain $c_1 \triangleleft_i \cdots \triangleleft_i c_k \triangleleft_i a_i$. Then for every Nash equilibrium $a^*$ of $G$, there exists a lean
equilibrium $b \in A$ such that $b \trianglelefteq a^*$.

Proof. For an action $a_i$ from an action set $A_i$, let $C(a_i)$ denote the maximum length $k$ of a chain
$c_1 \triangleleft_i \cdots \triangleleft_i c_k \triangleleft_i a_i$. For an action profile $a \in A$ define $C(a) = \sum_{i \in N} C(a_i)$. We prove the result by induction on
$C(a^*)$.

For the base case, where $C(a^*) = 0$, the profile $a^*$ is a lean equilibrium, as for all $i \in N$ and for all
$a_i \in A_i$, it does not hold that $a_i \triangleleft_i a^*_i$. For the inductive case, suppose that $C(a^*) > 0$. If the profile $a^*$ is not
a lean equilibrium, then there exist $i \in N$ and $a_i \in A_i$ such that $(a^*_{-i}, a_i)$ is a Nash equilibrium
and $a_i \triangleleft_i a^*_i$. We have that $C(a^*_{-i}, a_i) < C(a^*)$, since any chain below $a_i$ extends, by appending $a_i$, to a
longer chain below $a^*_i$; by applying the induction hypothesis to $(a^*_{-i}, a_i)$, we obtain
a lean equilibrium $b \in A$ such that $b \trianglelefteq (a^*_{-i}, a_i)$. We have $(a^*_{-i}, a_i) \trianglelefteq a^*$ and thus by transitivity of $\trianglelefteq$, it
holds that $b \trianglelefteq a^*$. □
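
The induction in this proof can be read as a descent procedure: starting from a Nash equilibrium, repeatedly replace one player's action by a strictly simpler one as long as the result is still a Nash equilibrium. The Python sketch below illustrates this on a small hypothetical game that is not taken from the paper: each player picks one of four automata for the repeated Prisoner's Dilemma (grim trigger `G`, a redundant three-state variant `G2` with the same behaviour, always-cooperate `C`, always-defect `D`), complexity is the number of states, and the payoff table lists the corresponding limit-of-means payoffs. All names are assumptions made for the example.

```python
# Hypothetical restricted game: each player picks one of four automata for the
# repeated Prisoner's Dilemma. G2 behaves like grim trigger G but wastes a state.
ACTIONS = ["G2", "G", "C", "D"]
COST = {"G2": 3, "G": 2, "C": 1, "D": 1}          # number of states
BEHAVIOUR = {"G2": "G", "G": "G", "C": "C", "D": "D"}
# Limit-of-means payoffs (player 1, player 2), indexed by behaviour.
PAYOFF = {
    ("G", "G"): (2, 2), ("G", "C"): (2, 2), ("C", "G"): (2, 2), ("C", "C"): (2, 2),
    ("G", "D"): (0, 0), ("D", "G"): (0, 0), ("D", "D"): (0, 0),
    ("C", "D"): (-1, 3), ("D", "C"): (3, -1),
}

def u(profile):
    return PAYOFF[(BEHAVIOUR[profile[0]], BEHAVIOUR[profile[1]])]

def replace(profile, i, a):
    """Profile obtained when player i switches to action a."""
    return profile[:i] + (a,) + profile[i + 1:]

def is_nash(profile):
    return all(u(replace(profile, i, a))[i] <= u(profile)[i]
               for i in (0, 1) for a in ACTIONS)

def nash_preserving_simplification(profile):
    """Return a profile in which one player is strictly simpler and Nash
    equilibrium is preserved, or None if no such simplification exists."""
    for i in (0, 1):
        for a in ACTIONS:
            if COST[a] < COST[profile[i]] and is_nash(replace(profile, i, a)):
                return replace(profile, i, a)
    return None

def descend_to_lean(profile):
    """Simplify step by step; total cost strictly decreases, so this terminates."""
    assert is_nash(profile)
    while (simpler := nash_preserving_simplification(profile)) is not None:
        profile = simpler
    return profile

print(descend_to_lean(("G2", "G2")))
```

In this toy game the descent removes the wasted state from each player but stops at the pair of grim trigger machines: simplifying further to always-cooperate would invite the opponent to defect.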

One of the equilibrium notions studied by Abreu and Rubinstein [1] is Nash equilibrium with respect to
the lexicographical ordering where payoff is prioritized over complexity: one profile a* is strictly preferred 
by a player to another profile b* if a* gives the player a strictly higher payoff than b*, or if a* gives the player 
the same payoff as b* but the player has strictly lower complexity in a*. We formalize this equilibrium notion 
and show that each such equilibrium is a lean equilibrium, as follows. 

Definition 3.3 Let $G = (N, (A_i), (\succsim_i))$ be a strategic game with complexity orders $(\trianglelefteq_i)$. A profile $a^* \in A$
of actions is an Abreu-Rubinstein equilibrium of the game $G$ if for all $i \in N$ and for all $a_i \in A_i$, (1)
$(a^*_{-i}, a_i) \precsim_i (a^*_{-i}, a^*_i)$ and (2) $(a^*_{-i}, a^*_i) \precsim_i (a^*_{-i}, a_i)$ implies that $a_i \triangleleft_i a^*_i$ does not hold. □

Condition (1) implies that the profile $a^*$ is a Nash equilibrium, and condition (2) essentially says that
there is no deviation $a_i$ for player $i$ that yields the same utility and is simpler than $a^*_i$.

Proposition 3.4 Let $G = (N, (A_i), (\succsim_i))$ be a strategic game with complexity orders $(\trianglelefteq_i)$. Every
Abreu-Rubinstein equilibrium is a lean equilibrium.

Proof. Let $a^*$ be an Abreu-Rubinstein equilibrium. Clearly, the profile $a^*$ is a Nash equilibrium. Now
consider a player $i \in N$ and an action $a_i \in A_i$. We want to show that if $a_i \triangleleft_i a^*_i$, then $(a^*_{-i}, a_i)$ is not a Nash
equilibrium. By the definition of Abreu-Rubinstein equilibrium, if $a_i \triangleleft_i a^*_i$, then $(a^*_{-i}, a_i) \prec_i (a^*_{-i}, a^*_i)$,
implying, as desired, that $(a^*_{-i}, a_i)$ is not a Nash equilibrium. □
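
To make the relationship between the two notions concrete, the following Python sketch enumerates both kinds of equilibria in a small hypothetical game (not from the paper) in which each player chooses among grim trigger `G`, always-cooperate `C`, and always-defect `D` for the repeated Prisoner's Dilemma, with complexity given by the number of states. The helper names are assumptions made for this illustration.

```python
from itertools import product

ACTIONS = ["G", "C", "D"]
COST = {"G": 2, "C": 1, "D": 1}                   # number of states
# Limit-of-means payoffs (player 1, player 2) for each pair of automata.
PAYOFF = {
    ("G", "G"): (2, 2), ("G", "C"): (2, 2), ("C", "G"): (2, 2), ("C", "C"): (2, 2),
    ("G", "D"): (0, 0), ("D", "G"): (0, 0), ("D", "D"): (0, 0),
    ("C", "D"): (-1, 3), ("D", "C"): (3, -1),
}

def replace(profile, i, a):
    return profile[:i] + (a,) + profile[i + 1:]

def is_nash(profile):
    return all(PAYOFF[replace(profile, i, a)][i] <= PAYOFF[profile][i]
               for i in (0, 1) for a in ACTIONS)

def is_lean(profile):
    """Nash, and no single-player strict simplification preserves Nash."""
    return is_nash(profile) and not any(
        COST[a] < COST[profile[i]] and is_nash(replace(profile, i, a))
        for i in (0, 1) for a in ACTIONS)

def is_abreu_rubinstein(profile):
    for i in (0, 1):
        here = PAYOFF[profile][i]
        for a in ACTIONS:
            dev = PAYOFF[replace(profile, i, a)][i]
            if dev > here:                                   # violates condition (1)
                return False
            if dev == here and COST[a] < COST[profile[i]]:   # violates condition (2)
                return False
    return True

profiles = list(product(ACTIONS, repeat=2))
lean = {p for p in profiles if is_lean(p)}
ar = {p for p in profiles if is_abreu_rubinstein(p)}
print(sorted(lean), sorted(ar))
```

In this toy game every Abreu-Rubinstein equilibrium is lean, as the proposition requires, while the grim trigger pair is lean without being an Abreu-Rubinstein equilibrium.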

In light of this proposition, all examples of Abreu-Rubinstein equilibria are examples of lean equilibria. 
For instance, the Abreu-Rubinstein equilibria of certain two-player games are studied in [1, Section 5]; all
examples given there are examples of lean equilibria. On the other hand, later in this paper, we will encounter
examples of lean equilibria that are not Abreu-Rubinstein equilibria (for instance, in Examples 5.5 and 5.9).

4 Machine games 

In this section, we introduce the machine games whose lean equilibria we will study, and some associated 
notions. These games involve choosing machines which implement strategies for repeated games, and have 
been previously studied, as discussed in the introduction. For more background on strategies as machines 
and for some simple examples, we refer the reader to Osborne and Rubinstein [22, Section 8.4 and Chapter
9].

Machines and machine games. Let $G = (S_1, S_2, u_1, u_2)$ be a two-person game in strategic form, where
$S_i$ is a finite set of actions for player $i$, and $u_i : S_1 \times S_2 \to \mathbb{R}$ is the payoff function for player $i$.

A machine for player $i$ is a four-tuple $M_i = (Q_i, q_i^1, \lambda_i, \delta_i)$ where $Q_i$ is a finite set of states, $q_i^1 \in Q_i$ is
the start state or initial state, $\lambda_i : Q_i \to S_i$ is the output function, and $\delta_i : Q_i \times S_j \to Q_i$ is the transition
function; here, $S_j$ denotes the action set of the other player. We emphasize that in this paper, we consider
only machines that conform to this definition, namely, machines which have a finite number of states and
which are deterministic. We use $\mathcal{M}_i$ to denote the set of all machines for player $i$, relative to a game $G$. We
often use $i$ and $j$ to denote the two different players.

A pair of machines $(M_1, M_2)$ naturally induces a sequence $(q^t)_{t \ge 1}$ of state pairs and a sequence $(s^t)_{t \ge 1}$
of action pairs ($G$-outcomes), defined inductively as follows:

$q^1 = (q_1^1, q_2^1)$

$s^t = (\lambda_1(q_1^t), \lambda_2(q_2^t))$ for $t \ge 1$
$q^t = (\delta_1(q_1^{t-1}, s_2^{t-1}), \delta_2(q_2^{t-1}, s_1^{t-1}))$ for $t > 1$

Each of the sequences $(q^t)$, $(s^t)$ is ultimately periodic; we say that a sequence $(b^t)_{t \ge 1}$ is ultimately periodic
if there exist numbers $n, p \ge 1$ such that for all $m \ge n$, it holds that $b^m = b^{m+p}$.

The payoff given to each machine is computed by the limit of means. For a sequence of action pairs $(s^t)$,
we define $r_i^T(s^t)$ to be the average payoff to player $i$ over the first $T$ elements of the sequence, that is, we
define $r_i^T(s^t) = \frac{1}{T} \sum_{k=1}^{T} u_i(s^k)$. We define $r_i(s^t)$ to be the corresponding limit of the average payoffs, that is,
$r_i(s^t) = \lim_{T \to \infty} r_i^T(s^t)$; note that we will make use of this function only on ultimately periodic sequences
$(s^t)$, and so the limit will always exist. For a pair of machines $(M_1, M_2)$, we define $r_i^T(M_1, M_2) = r_i^T(s^t)$
and $r_i(M_1, M_2) = r_i(s^t)$, where here $(s^t)$ denotes the sequence of action pairs induced by the machines
$(M_1, M_2)$. Our focus will be on the machine game defined by $G_M = (\mathcal{M}_1, \mathcal{M}_2, r_1, r_2)$.
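
The induced sequences and the limit-of-means payoff can be computed directly for small machines. The Python sketch below is one possible encoding, not the paper's: a machine is a triple of start state, output map, and transition map keyed by (state, opponent action). It runs two machines until a state pair repeats, which splits the action sequence into a finite prefix and a cycle; the limit of means is then the average payoff over the cycle. The payoffs are those of the Prisoner's Dilemma used later in the paper, and `run` and `limit_of_means` are hypothetical helper names.

```python
from fractions import Fraction

PD = {
    ("C", "C"): (2, 2), ("C", "D"): (-1, 3),
    ("D", "C"): (3, -1), ("D", "D"): (0, 0),
}

# Machine = (start state, output map, transition map keyed by (state, opponent action)).
GRIM = ("coop", {"coop": "C", "punish": "D"},
        {("coop", "C"): "coop", ("coop", "D"): "punish",
         ("punish", "C"): "punish", ("punish", "D"): "punish"})
ALL_D = ("d", {"d": "D"}, {("d", "C"): "d", ("d", "D"): "d"})

def run(m1, m2):
    """Return (prefix, cycle): the action pairs before and inside the eventual loop."""
    (q1, out1, d1), (q2, out2, d2) = m1, m2
    seen, actions = {}, []
    while (q1, q2) not in seen:
        seen[(q1, q2)] = len(actions)
        s = (out1[q1], out2[q2])
        actions.append(s)
        # Each machine moves on the action just played by the other machine.
        q1, q2 = d1[(q1, s[1])], d2[(q2, s[0])]
    start = seen[(q1, q2)]
    return actions[:start], actions[start:]

def limit_of_means(u, m1, m2):
    """Average payoff over the cycle; the finite prefix does not affect the limit."""
    _, cycle = run(m1, m2)
    return tuple(Fraction(sum(u[s][i] for s in cycle), len(cycle)) for i in (0, 1))

print(run(GRIM, ALL_D))
print(limit_of_means(PD, GRIM, GRIM), limit_of_means(PD, GRIM, ALL_D))
```

Because the pair of states determines everything that follows, the first repeated state pair marks the start of the period, which is why the sequences are ultimately periodic.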

Paths and cycles. With respect to a two-player game, a path in a machine $M_i$ is a sequence
$p_1 \xrightarrow{a^1} p_2 \xrightarrow{a^2} \cdots \xrightarrow{a^m} p_{m+1}$ where $p_1, \ldots, p_{m+1} \in Q_i$ are states, $a^1, \ldots, a^m \in S_j$ are actions, and $\delta_i(p_k, a^k) = p_{k+1}$ for
each $k \in \{1, \ldots, m\}$. A cycle is a path where $p_1 = p_{m+1}$, and is said to be a simple cycle if the states
$p_1, \ldots, p_m$ are pairwise distinct. For a path $P$ in $M_1$, the payoff to $M_1$, denoted by $r_1(P)$, is defined as
$\frac{1}{m} \sum_{k=1}^{m} u_1(\lambda_1(p_k), a^k)$; and, the payoff to $M_2$, denoted by $r_2(P)$, is defined as $\frac{1}{m} \sum_{k=1}^{m} u_2(\lambda_1(p_k), a^k)$.
For a path $P$ in $M_2$, the payoffs $r_1(P)$ and $r_2(P)$ are defined similarly. It is known and straightforward to
verify that, in the machine game $G_M$, a payoff-maximizing response for player $i$ to a machine $M_j$ has payoff
equal to the maximum of $r_i(C)$ over all cycles $C$ in the machine $M_j$ reachable from the initial state, which is
equal to the maximum of $r_i(C)$ over all such cycles in the machine $M_j$ that are simple.
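
The characterization of best-response payoffs via simple cycles suggests a direct, if naive, computation for small machines: enumerate the simple cycles reachable from the initial state and take the best average payoff. The Python sketch below does this by depth-first search; it is a brute-force illustration under an assumed machine encoding (start state, output map, transition map), not an efficient algorithm, and the names are hypothetical.

```python
from fractions import Fraction

PD = {
    ("C", "C"): (2, 2), ("C", "D"): (-1, 3),
    ("D", "C"): (3, -1), ("D", "D"): (0, 0),
}
GRIM = ("coop", {"coop": "C", "punish": "D"},
        {("coop", "C"): "coop", ("coop", "D"): "punish",
         ("punish", "C"): "punish", ("punish", "D"): "punish"})
ALL_C = ("c", {"c": "C"}, {("c", "C"): "c", ("c", "D"): "c"})

def best_response_payoff(u, machine, owner):
    """Best limit-of-means payoff for the opponent of `owner` (0 or 1) against
    `machine`, as the maximum mean payoff over reachable simple cycles."""
    start, out, delta = machine
    opp = 1 - owner
    opp_actions = sorted({a for (_q, a) in delta})

    # States reachable from the initial state.
    reachable, stack = {start}, [start]
    while stack:
        q = stack.pop()
        for a in opp_actions:
            nxt = delta[(q, a)]
            if nxt not in reachable:
                reachable.add(nxt)
                stack.append(nxt)

    def step_payoff(q, a):
        profile = (out[q], a) if owner == 0 else (a, out[q])
        return u[profile][opp]

    best = None

    def extend(origin, q, visited, total, length):
        nonlocal best
        for a in opp_actions:
            nxt = delta[(q, a)]
            new_total = total + step_payoff(q, a)
            if nxt == origin:                      # closed a simple cycle
                mean = Fraction(new_total, length + 1)
                if best is None or mean > best:
                    best = mean
            elif nxt not in visited:
                extend(origin, nxt, visited | {nxt}, new_total, length + 1)

    for q in reachable:
        extend(q, q, {q}, 0, 0)
    return best

print(best_response_payoff(PD, GRIM, 0), best_response_payoff(PD, ALL_C, 0))
```

Comparing this value with the payoff a given response actually obtains gives a simple test of whether that response is a best response, and hence of whether a pair of machines is a Nash equilibrium of the machine game.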

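Let us define a subcycle of a cycle $p_1 \xrightarrow{a^1} p_2 \xrightarrow{a^2} \cdots p_{m+1}$ to be a cycle that has either the form
$p_n \xrightarrow{a^n} p_{n+1} \xrightarrow{a^{n+1}} \cdots p_{n'}$, or the form $p_{n'} \xrightarrow{a^{n'}} p_{n'+1} \cdots p_{m+1} = p_1 \xrightarrow{a^1} p_2 \cdots p_n$, with $n, n'$
satisfying $1 \le n < n' \le m$.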

Complexity measures. We will study three complexity measures on machines. The first is the number of
states $|Q_i|$ of a machine $M_i$. We define the other two in the following way. We define a threat state of a
machine $M_i$ to be a state $q$ such that $\delta_i(q, s) = q$ for all $s \in S_j$ and where $\max_{a_j \in S_j} u_j(\lambda_i(q), a_j)$ is the
minmax payoff of the other player $j$, that is, where $\lambda_i(q)$ forces the other player $j$ to his minmax payoff. We
define a normal state of a machine $M_i$ to be a state that is not a threat state. We use $R_i$ to denote the set of
all normal states of a machine $M_i$, and we use $\|\delta_i\|$ to denote the number of normal transitions, by which
we mean transitions between normal states:

$\|\delta_i\| = |\{(q_i, s_j) \in R_i \times S_j \mid \delta_i(q_i, s_j) \in R_i\}|$.

The two other complexity measures that we will study are the number of normal states, denoted by $|R|$,
and the number of normal transitions, denoted by $\|\delta\|$. We will speak of lean equilibria with respect to,
for instance, the measure $|R|$, by which we mean a lean equilibrium where the complexity order for player $i$
is given by $M_i \trianglelefteq_i M'_i$ if and only if $|R_i| \le |R'_i|$; here, $R_i$ and $R'_i$ denote the sets of normal states of the
machines $M_i$ and $M'_i$, respectively.
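
The three measures are easy to compute for a concrete machine. The following Python sketch, under an assumed encoding of machines as (start state, output map, transition map), identifies threat states and returns the number of states, the number of normal states, and the number of normal transitions. It uses pure-action minmax payoffs and the Prisoner's Dilemma payoffs from later in the paper; the function names are hypothetical.

```python
PD = {
    ("C", "C"): (2, 2), ("C", "D"): (-1, 3),
    ("D", "C"): (3, -1), ("D", "D"): (0, 0),
}
GRIM = ("coop", {"coop": "C", "punish": "D"},
        {("coop", "C"): "coop", ("coop", "D"): "punish",
         ("punish", "C"): "punish", ("punish", "D"): "punish"})
ALL_C = ("c", {"c": "C"}, {("c", "C"): "c", ("c", "D"): "c"})
ALL_D = ("d", {"d": "D"}, {("d", "C"): "d", ("d", "D"): "d"})

def minmax_payoff(u, i):
    """Pure-action minmax payoff of player index i (0 or 1)."""
    own = sorted({p[i] for p in u})
    other = sorted({p[1 - i] for p in u})
    def payoff(a_own, a_other):
        return u[(a_own, a_other) if i == 0 else (a_other, a_own)][i]
    return min(max(payoff(a, b) for a in own) for b in other)

def threat_states(u, machine, owner):
    """Absorbing states whose output holds the opponent to the minmax payoff."""
    _, out, delta = machine
    opp = 1 - owner
    opp_actions = sorted({a for (_q, a) in delta})
    v_opp = minmax_payoff(u, opp)
    threats = set()
    for q, action in out.items():
        absorbing = all(delta[(q, a)] == q for a in opp_actions)
        best_reply = max(
            u[(action, a) if owner == 0 else (a, action)][opp] for a in opp_actions)
        if absorbing and best_reply == v_opp:
            threats.add(q)
    return threats

def complexity(u, machine, owner):
    """Return (number of states, number of normal states, number of normal transitions)."""
    _, out, delta = machine
    normal = set(out) - threat_states(u, machine, owner)
    normal_transitions = sum(
        1 for (q, _a), nxt in delta.items() if q in normal and nxt in normal)
    return len(out), len(normal), normal_transitions

for name, m in [("grim", GRIM), ("all-C", ALL_C), ("all-D", ALL_D)]:
    print(name, complexity(PD, m, 0))
```

Note that under these definitions the single state of the always-defect machine is itself a threat state, so that machine has no normal states, while grim trigger has one normal state and one normal transition.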

Relative to a pair of machines $(M_1, M_2)$, for each $i \in \{1, 2\}$, we define the set of played states for
player $i$, denoted by $P_i$, to be the set $\{q_i^t \mid t \ge 1\}$; here, $(q^t)$ denotes the sequence of state pairs induced by
$(M_1, M_2)$. We identify the following facts concerning played states, which we will sometimes use tacitly in
the sequel.

Proposition 4.1 Let $(M_1, M_2)$ be a pair of machines having a strictly enforceable payoff profile. For each
player $i \in \{1, 2\}$:

• Every played state is a normal state, and thus it holds that $|P_i| \le |R_i|$.

• The number of played states lower bounds the number of normal transitions: $|P_i| \le \|\delta_i\|$.

The first claim follows from the assumption that the payoff profile is strictly enforceable, which implies that
neither player ever plays a threat state. The second claim follows from the first and the observation that
every played state has at least one transition to another played state.

Equivalence relations. We now introduce a number of equivalence relations, each of which is defined
over the set of positive integers, that will be used in our analysis and description of lean equilibria. Let
$(s^t)$ and $(q^t)$ be the sequences of action pairs and state pairs, respectively, induced by a pair of machines.
We define the equivalence relation $\equiv_s$ by: $t \equiv_s t'$ if and only if for all $n \ge 0$, it holds that $s^{t+n} = s^{t'+n}$.
Similarly, we define the equivalence relation $\equiv_q$ by: $t \equiv_q t'$ if and only if for all $n \ge 0$, it holds that
$q^{t+n} = q^{t'+n}$. However, from the determinism of the machines, it is straightforward to verify that $t \equiv_q t'$
if and only if $q^t = q^{t'}$; we will make use of this simpler characterization. As the sequence $(s^t)$ is equal
to the sequence $(q^t)$ mapped under the functions $(\lambda_1, \lambda_2)$, it is clear that if $t \equiv_q t'$, then $t \equiv_s t'$; viewing
these equivalence relations as sets of pairs, we can write $\equiv_q \subseteq \equiv_s$. We define the equivalence relations $\equiv_i$
for $i \in \{1, 2\}$ by $t \equiv_i t'$ if and only if $q_i^t = q_i^{t'}$. It is clear that $t \equiv_q t'$ if and only if $t \equiv_1 t'$ and $t \equiv_2 t'$. For a
value $t \ge 1$, we will use $[t]_s$ to denote the $\equiv_s$-equivalence class of $t$, and similarly for the other equivalence
relations.

5 Establishing lean equilibrium: examples and theory 

In this section, we give techniques for establishing that outcomes of the machine game are at lean
equilibrium, and illustrate their use by presenting a number of examples. We begin by defining some notions;
the definitions and also the later results are relative to a game $G = (S_1, S_2, u_1, u_2)$ and its corresponding
machine game $G_M$, although in what follows we will generally not mention the games $G$ and $G_M$ explicitly.

By a finite action sequence, we mean a finite-length sequence $\sigma = \sigma^1 \ldots \sigma^k$ of action pairs (elements
of $S_1 \times S_2$). For each player $i \in \{1, 2\}$, we define the payoff of a finite action sequence $\sigma$ as
$r_i(\sigma) = (u_i(\sigma^1) + \cdots + u_i(\sigma^k))/k$. We say that a finite action sequence $\sigma$ is strictly enforceable if its payoff
profile $(r_1(\sigma), r_2(\sigma))$ is strictly enforceable. Each strictly enforceable finite action sequence $\sigma$ naturally
induces a pair of machines $(M_1^\sigma, M_2^\sigma)$, where for each player $i \in \{1, 2\}$, the machine $M_i^\sigma$ is defined to
have $k + 1$ states: $k$ normal states, which we denote as $\{1, \ldots, k\}$, and a threat state. The output function
$\lambda_i$ is defined with $\lambda_i(n) = \sigma_i^n$ for all $n \in \{1, \ldots, k\}$. The transition function $\delta_i$ has $\delta_i(n, \sigma_j^n) = n + 1$
for $n \in \{1, \ldots, k - 1\}$, and $\delta_i(k, \sigma_j^k) = 1$; all other transitions out of the normal states go to the threat
state. We use $\langle\sigma\rangle$ to denote the infinite sequence obtained by repeating $\sigma$, that is, the sequence
$\sigma\sigma\ldots = \sigma^1 \ldots \sigma^k \sigma^1 \ldots \sigma^k \ldots$. The sequence of action pairs generated by $(M_1^\sigma, M_2^\sigma)$ is clearly equal to $\langle\sigma\rangle$. This
property is a motivation for the definition of the machines: the transitions of the machines are designed so
that the machines together will generate the sequence $\langle\sigma\rangle$, but each machine will "punish" any deviation
from this sequence by moving to and settling upon its threat state. We call a machine $M$ a $\sigma$-machine
if for any best response $M'$ to $M$, the pair $(M, M')$ generates the sequence $\langle\sigma\rangle$; clearly, for any strictly
enforceable sequence $\sigma$, the machines $M_1^\sigma$ and $M_2^\sigma$ are $\sigma$-machines.
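
The construction of the induced machines can be sketched in a few lines of Python. The encoding of a machine as (start state, output map, transition map) and the helper names are assumptions made for this illustration. In particular, the action played in the threat state is passed in as a parameter (`threat_action`), since it depends on the underlying game; for the Prisoner's Dilemma it is taken to be `D`.

```python
def sigma_machine(sigma, owner, opp_actions, threat_action):
    """Build the machine induced by sigma for player index `owner` (0 or 1).

    States are 1..k plus "threat". `threat_action` is an assumption of this
    sketch: the action that holds the opponent to the minmax payoff.
    """
    k = len(sigma)
    opp = 1 - owner
    out = {n: sigma[n - 1][owner] for n in range(1, k + 1)}
    out["threat"] = threat_action
    delta = {}
    for n in range(1, k + 1):
        for a in opp_actions:
            if a == sigma[n - 1][opp]:
                delta[(n, a)] = n + 1 if n < k else 1   # follow sigma, wrapping around
            else:
                delta[(n, a)] = "threat"                # punish any deviation
    for a in opp_actions:
        delta[("threat", a)] = "threat"
    return (1, out, delta)

def generated_prefix(m1, m2, length):
    """First `length` action pairs generated by the pair of machines."""
    (q1, out1, d1), (q2, out2, d2) = m1, m2
    result = []
    for _ in range(length):
        s = (out1[q1], out2[q2])
        result.append(s)
        q1, q2 = d1[(q1, s[1])], d2[(q2, s[0])]
    return result

sigma = [("C", "C"), ("C", "C"), ("D", "D")]
M1 = sigma_machine(sigma, 0, ["C", "D"], "D")
M2 = sigma_machine(sigma, 1, ["C", "D"], "D")
ALL_D = ("d", {"d": "D"}, {("d", "C"): "d", ("d", "D"): "d"})

print(generated_prefix(M1, M2, 6))      # two repetitions of sigma
print(generated_prefix(M1, ALL_D, 3))   # a deviation triggers the threat state
```

Run together, the two induced machines reproduce the repeated sequence; run against an opponent that departs from it, the machine moves to its threat state after the first deviation and stays there.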

We now present the first concepts and results that will allow us to give examples of lean equilibria.
Relative to a sequence $(s^t)$ of action pairs we say that two time points $t_1, t_2 \ge 1$ are $i$-incompatible if there
exists $m \ge 0$ such that (1) for all $n$ with $0 \le n < m$, it holds that $s_j^{t_1+n} = s_j^{t_2+n}$, and (2) $s_i^{t_1+m} \ne s_i^{t_2+m}$.
We say that two equivalence classes $T_1, T_2$ of $\equiv_s$ are $i$-incompatible if there exist $t_1 \in T_1$ and $t_2 \in T_2$ such
that $t_1$ and $t_2$ are $i$-incompatible; observe that, in fact, if equivalence classes $T_1$ and $T_2$ are $i$-incompatible,
then for all $t_1 \in T_1$, $t_2 \in T_2$ it holds that $t_1$ and $t_2$ are $i$-incompatible.

Proposition 5.1 Let $(q^t)$, $(s^t)$ be the state sequence and action sequence induced by a pair of machines. If
two $\equiv_s$-equivalence classes $T_1, T_2$ are $i$-incompatible, then for all $t_1 \in T_1$, $t_2 \in T_2$ it holds that $q_i^{t_1} \ne q_i^{t_2}$.

This proposition follows immediately from the definitions of $(q^t)$ and $(s^t)$.

We say that a finite action sequence $\sigma$ is $i$-irreducible if for any two distinct values $t_1, t_2 \in \{1, \ldots, k\}$,
it holds that $[t_1]_s$ and $[t_2]_s$ are $i$-incompatible with respect to $(s^t) = \langle\sigma\rangle$.
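
For a concrete sequence, irreducibility can be checked mechanically. The Python sketch below reads the definition of incompatibility as follows: two positions are incompatible for a player if, scanning forward in the repeated sequence, that player's own action differs at some step before the other player's actions have differed. Since the repeated sequence has period k, scanning k steps suffices. Player indices are 0 and 1 here, and the function names are hypothetical.

```python
def incompatible(sigma, i, t1, t2):
    """Are positions t1, t2 (1-based) i-incompatible in the repetition of sigma?"""
    k = len(sigma)
    j = 1 - i

    def at(t):
        return sigma[(t - 1) % k]

    for m in range(k):
        x, y = at(t1 + m), at(t2 + m)
        if x[i] != y[i]:      # condition (2): player i's actions differ at step m
            return True
        if x[j] != y[j]:      # condition (1) fails from here on
            return False
    return False              # the two suffixes coincide

def irreducible(sigma, i):
    k = len(sigma)
    return all(incompatible(sigma, i, t1, t2)
               for t1 in range(1, k + 1) for t2 in range(t1 + 1, k + 1))

def block(n, pair):
    """The finite action sequence consisting of n copies of `pair`."""
    return [pair] * n

sigma = block(2, ("C", "C")) + block(3, ("D", "D"))
print(irreducible(sigma, 0), irreducible(sigma, 1))

mixed = [("C", "C"), ("C", "D")]
print(irreducible(mixed, 0), irreducible(mixed, 1))
```

In the second sequence the first player always plays C, so that player's positions cannot be told apart by his own actions before the opponent's actions diverge; the sequence is irreducible for the second player only.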

Theorem 5.2 Let $\sigma$ be a strictly enforceable finite action sequence of length $k$, and let $M_j$ be a $\sigma$-machine.
If $\sigma$ is $i$-irreducible, then there are $k$ $\equiv_s$-equivalence classes, $[1]_s, \ldots, [k]_s$; and, for any best response
$N_i$ to $M_j$ with $|P_i| \le k$, the pair $(N_i, M_j)$ has $\equiv_i$ equal to $\equiv_s$.

Proof. By the definition of $i$-irreducibility and the definition of $\equiv_s$, we have that there are $k$ $\equiv_s$-equivalence
classes, $[1]_s, \ldots, [k]_s$. Consider a best response $N_i$ to $M_j$. Since $M_j$ is a $\sigma$-machine, the pair $(N_i, M_j)$
must produce an action sequence $(s^t)$ equal to $\langle\sigma\rangle$. By Proposition 5.1, no $i$-state is played in two different
$\equiv_s$-equivalence classes. Since by hypothesis the number of states played by $i$ is less than or equal to $k$, we
have that $[1]_s, \ldots, [k]_s$ must be the equivalence classes of $\equiv_i$, and hence that $\equiv_i$ is equal to $\equiv_s$. □

In this section, we will give a number of examples involving the Prisoner's Dilemma, which we take to
be the following game:

            C          D
    C    (2, 2)    (-1, 3)
    D    (3, -1)   (0, 0)

Note that, in this game, each of the two players has a minmax payoff of 0. For an integer $N \ge 0$ and an action
pair $(s_1, s_2)$, we will use the notation $N \cdot (s_1, s_2)$ to denote the finite action sequence containing $N$ copies
of the pair $(s_1, s_2)$. For instance, $2 \cdot (C, D)$ represents the sequence $(C, D), (C, D)$ and $2 \cdot (D, C), 3 \cdot (C, D)$
represents the sequence $(D, C), (D, C), (C, D), (C, D), (C, D)$.

Example 5.3 Let $N_C, N_D \ge 1$ be constants, and consider the finite action sequence $\sigma = N_C \cdot (C, C), N_D \cdot (D, D)$
in the Prisoner's Dilemma. Clearly, the sequence $\sigma$ is strictly enforceable, and has length $k = N_C + N_D$.
We show that, with respect to the measures $|R|$ and $\|\delta\|$, the pair $(M_1^\sigma, M_2^\sigma)$ is an
Abreu-Rubinstein equilibrium, and hence a lean equilibrium (by Proposition 3.4), as follows.

First, we show that the sequence $\sigma$ is both 1-irreducible and 2-irreducible. We begin by arguing
1-irreducibility. Let $t_1, t_2 \in \{1, \ldots, k\}$ be two distinct values, and assume without loss of generality that
$t_1 < t_2$. We show that $t_1$ and $t_2$ are 1-incompatible with respect to $\langle\sigma\rangle$. If $\sigma_1^{t_1} \ne \sigma_1^{t_2}$, then clearly $t_1$ and
$t_2$ are 1-incompatible. In the case that $\sigma_1^{t_1} = \sigma_1^{t_2}$, by the definition of $\sigma$, there exists a minimum value
$m \ge 1$ such that $\sigma_1^{t_1} \ne \langle\sigma\rangle_1^{t_2+m}$; observe that $\sigma_1^{t_1} = \langle\sigma\rangle_1^{t_1+m}$. For all $n$ with $0 \le n < m$, we have
$\langle\sigma\rangle_2^{t_1+n} = \langle\sigma\rangle_2^{t_2+n} = \sigma_2^{t_1}$, and so we have that $t_1$ and $t_2$ are 1-incompatible. We thus have that $\sigma$ is
1-irreducible; by an argument that is identical up to swapping the players, we also have that $\sigma$ is 2-irreducible.

We now argue that the pair $(M_1^\sigma, M_2^\sigma)$ is an Abreu-Rubinstein equilibrium with respect to $|R|$ and $\|\delta\|$.
Observe that for these machines, we have $|R_1| = |R_2| = \|\delta_1\| = \|\delta_2\| = k$. We show by contradiction that
there is no player 1 best response $N_1$ to $M_2^\sigma$ having $|R_1| < k$ or $\|\delta_1\| < k$. Suppose that there is; then the
payoff profile of $(N_1, M_2^\sigma)$ must be $(r_1(\sigma), r_2(\sigma))$, which is strictly enforceable, and by Proposition 4.1
it holds that $|P_1| < k$. By the 1-irreducibility of $\sigma$ and Theorem 5.2 it holds that $\equiv_1$ is equal to $\equiv_s$ and
hence that $|P_1| = k$, contradicting that $|P_1| < k$. It can similarly be shown that there is no player 2 best
response $N_2$ to $M_1^\sigma$ having $|R_2| < k$ or $\|\delta_2\| < k$. We have thus argued that the pair $(M_1^\sigma, M_2^\sigma)$ is an
Abreu-Rubinstein equilibrium with respect to $|R|$ and $\|\delta\|$.

In the Prisoner's Dilemma, it can similarly be shown that for constants $N_{CD}, N_{DC} \geq 1$, the finite action
sequence $\sigma = N_{CD} \cdot (C, D), N_{DC} \cdot (D, C)$ is both 1-irreducible and 2-irreducible, and that when $\sigma$ is strictly
enforceable, the pair $(M_1^\sigma, M_2^\sigma)$ is an Abreu-Rubinstein equilibrium with respect to both $|R|$ and $\|\delta\|$. More
generally, let $G = (S_1, S_2, u_1, u_2)$ be a game, let $\beta_1 : \{1, \ldots, b\} \to S_1$ and $\beta_2 : \{1, \ldots, b\} \to S_2$ be
injective mappings, and let $N_1, \ldots, N_b \geq 1$ be constants, with $b \geq 2$. By arguments similar to those given
above, the finite action sequence $\sigma = N_1 \cdot (\beta_1(1), \beta_2(1)), \ldots, N_b \cdot (\beta_1(b), \beta_2(b))$ can be shown to be both
1-irreducible and 2-irreducible, and when $\sigma$ is strictly enforceable, the pair $(M_1^\sigma, M_2^\sigma)$ can be shown to be
an Abreu-Rubinstein equilibrium with respect to $|R|$ and $\|\delta\|$. □

We now introduce another technique for establishing lean equilibria. Let $\sigma$ be a finite action sequence.
Define a rotation of $\sigma = \sigma^1 \ldots \sigma^k$ to be a length $k$ sequence of the form $\sigma^n \sigma^{n+1} \ldots \sigma^k \sigma^1 \sigma^2 \ldots \sigma^{n-1}$ for $n$
with $1 \leq n \leq k$. Let $i \in \{1, 2\}$ be one of the players and let $B \subseteq S_i$. We say that $\sigma$ is $(i, B)$-rigid if for every
rotation $\rho$ of $\sigma$ and every $n$ with $1 \leq n < k$, when $\rho_i^1, \rho_i^{n+1} \in B$, it holds that
$(u_j(\rho^1) + \cdots + u_j(\rho^n))/n \neq r_j(\sigma)$.
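
The rigidity condition can be checked mechanically for a concrete sequence. The sketch below is a minimal illustration under stated assumptions: the payoff table `U` holds illustrative Prisoner's Dilemma values that do not come from the text, $r_j$ of a finite sequence is taken to be the average payoff to player $j$, players are indexed from 0, and the side condition is read as requiring both $\rho_i^1$ and $\rho_i^{n+1}$ to lie in $B$, which is how rigidity is used in Example 5.6. The names `rotations`, `average`, and `is_rigid` are hypothetical.

```python
from fractions import Fraction

# Assumed Prisoner's Dilemma payoffs (u1, u2); illustrative values only, not from the text.
U = {("C", "C"): (2, 2), ("C", "D"): (-1, 3), ("D", "C"): (3, -1), ("D", "D"): (0, 0)}


def rotations(sigma):
    """All k rotations sigma^n ... sigma^k sigma^1 ... sigma^(n-1), for n = 1..k."""
    return [sigma[n:] + sigma[:n] for n in range(len(sigma))]


def average(seq, player):
    """Exact average payoff of a finite action sequence to the given player (0-based)."""
    return Fraction(sum(U[pair][player] for pair in seq), len(seq))


def is_rigid(sigma, i, B):
    """Check (i, B)-rigidity of sigma; i is the 0-based player index, j the other player."""
    j = 1 - i
    k = len(sigma)
    target = average(sigma, j)
    for rho in rotations(sigma):
        for n in range(1, k):
            # rho[0] is rho^1 and rho[n] is rho^(n+1) in the 1-based notation of the text.
            if rho[0][i] in B and rho[n][i] in B:
                if average(rho[:n], j) == target:
                    return False
    return True


# Coprime block lengths (2 and 1) versus block lengths sharing a factor (2 and 2).
print(is_rigid([("C", "C")] * 2 + [("C", "D")], 0, {"C"}))
print(is_rigid([("C", "C")] * 2 + [("C", "D")] * 2, 0, {"C"}))
```

Exact rational arithmetic is used so that the test for equality with $r_j(\sigma)$ is not affected by floating-point rounding.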

Theorem 5.4 Let $i \in \{1, 2\}$ and $B \subseteq S_i$. Let $\sigma$ be a strictly enforceable finite action sequence of length $k$,
let $b$ be the number of elements $\sigma^n$ of $\sigma$ with $\sigma_i^n \in B$, and let $M_j$ be a $\sigma$-machine. If $\sigma$ is $(i, B)$-rigid, then
for any machine $N_i$ with $|\{q \in P_i \mid \lambda_i(q) \in B\}| < b$ relative to $(N_i, M_j)$, the pair $(N_i, M_j)$ is not a Nash
equilibrium.

Proof. Since $M_j$ is a $\sigma$-machine, if $(s^t)$ is not equal to $\langle\sigma\rangle$, then $N_i$ is not a best response to $M_j$ and
$(N_i, M_j)$ is not a Nash equilibrium, so we assume that $(s^t) = \langle\sigma\rangle$.

Consider the state pair sequence $(q^{1+dk})_{d \geq 1}$. By the finiteness of the state sets of the machines, some
state pair must occur infinitely often in this sequence. Hence we can find time points $t_1, t_2$ of the form $1 + dk$
with $t_1 < t_2$ such that $q^{t_1} = q^{t_2}$. The sequence $q_i^{t_1}, q_i^{t_1+1}, \ldots, q_i^{t_2}$ determines a cycle $C$
in $N_i$. By our choice of $t_1, t_2$ and the assumption that $(s^t) = \langle\sigma\rangle$, we have $r_j(C) = r_j(N_i, M_j) = r_j(\sigma)$.

By the hypothesis on $B$ and $b$, among the sequence of states $q_i^{t_1}, q_i^{t_1+1}, \ldots, q_i^{t_1+(k-1)}$ there must be
two indices $t' < t''$ with $q_i^{t'} = q_i^{t''}$ and $\lambda_i(q_i^{t'}) \in B$. The sequence of states $q_i^{t'}, q_i^{t'+1}, \ldots, q_i^{t''}$ determines a
subcycle $C'$ of $C$ which, by $(i, B)$-rigidity, has $r_j(C') \neq r_j(C)$. We can view $C$ as the concatenation of the
subcycle $C'$ with another subcycle $C''$. The value $r_j(C)$ is the convex combination of $r_j(C')$ and $r_j(C'')$;
since $r_j(C') \neq r_j(C)$, we have $r_j(C'') \neq r_j(C)$. It must hold that one of the values $r_j(C'), r_j(C'')$ is
strictly greater than $r_j(C)$. This implies that $(N_i, M_j)$ is not a Nash equilibrium, as player $j$ could strictly
improve his payoff by deviating. □
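
The convex-combination step in this proof can be written out explicitly. The display below is a sketch under two assumptions made here for illustration: $|C|$ denotes the number of transitions in a cycle $C$, and the payoff of a cycle is the average payoff over its transitions.

```latex
r_j(C) \;=\; \frac{|C'|}{|C|}\, r_j(C') \;+\; \frac{|C''|}{|C|}\, r_j(C''),
\qquad |C'| + |C''| = |C| .
```

Both weights are strictly positive, since $C'$ and $C''$ each contain at least one transition. Hence, if $r_j(C') \neq r_j(C)$, the values $r_j(C')$ and $r_j(C'')$ lie strictly on opposite sides of $r_j(C)$, and exactly one of them exceeds it.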

Example 5.5 Consider, in the Prisoner's Dilemma, a strictly enforceable payoff $w$ that is the convex
combination of $u(C, C)$ and $u(C, D)$. We can write $w = (N_{CC}/N)u(C, C) + (N_{CD}/N)u(C, D)$ where
$N_{CC}, N_{CD}$ are integers, and $N = N_{CC} + N_{CD}$. Since the payoff $w$ is strictly enforceable, we have
$N_{CC} > 0$. We can assume that $N_{CC}, N_{CD}$ do not share any prime factors, for if they do share one, we can
divide both of them by the factor while preserving the value of $w$. Note that this assumption implies that
$N_{CC}$ and $N$ do not share any prime factors.
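
The normalization step and the closing remark can be checked numerically. The following minimal sketch divides out common factors with `math.gcd` and confirms, for one illustrative input, that the reduced count is coprime to the total; this holds in general because $\gcd(a, a + b) = \gcd(a, b)$. The function name `reduce_counts` and the sample values are hypothetical.

```python
from math import gcd


def reduce_counts(n_cc, n_cd):
    """Divide both counts by their gcd, preserving the ratio n_cc : n_cd."""
    g = gcd(n_cc, n_cd)
    return n_cc // g, n_cd // g


# Weights 4/10 and 6/10 describe the same payoff as weights 2/5 and 3/5.
n_cc, n_cd = reduce_counts(4, 6)
n = n_cc + n_cd
print(n_cc, n_cd, n, gcd(n_cc, n))
```

The sketch assumes at least one count is positive, which the example guarantees through $N_{CC} > 0$.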

Let $\sigma = N_{CC} \cdot (C, C), N_{CD} \cdot (C, D)$. We will show that $(M_1^\sigma, M_2^\sigma)$ is a lean equilibrium with respect
to $|R|$ and $\|\delta\|$. Note that the first player could preserve payoff but reduce complexity via a machine with
one state that always outputs $C$, a machine that has $|R| = 1$ and $\|\delta\| = 2$. Hence this pair is not an Abreu-
Rubinstein equilibrium with respect to $|R|$ when $N \geq 2$, nor with respect to $\|\delta\|$ when $N \geq 3$. Along
these lines, observe that no two $\equiv_s$-equivalence classes (for $\langle\sigma\rangle$) are 1-incompatible, since in the sequence
$\sigma$ player 1 always plays the same action.

We show that $\sigma$ is $(1, \{C\})$-rigid. We show this by contradiction; let $\rho$ be a rotation of $\sigma$ and let $n$ be
such that $1 \leq n < N$ and $(u_2(\rho^1) + \cdots + u_2(\rho^n))/n = w$. This implies that for integers $n_{CC}, n_{CD}$ with
$0 \leq n_{CC} \leq n$, $0 \leq n_{CD} \leq n$, and $n_{CC} + n_{CD} = n$, we have $(n_{CC}/n)u(C, C) + (n_{CD}/n)u(C, D) = w$.
Since $u(C, C) \neq u(C, D)$, we have $n_{CC}/n = N_{CC}/N$, implying that $N_{CC} n = n_{CC} N$. This implies that
$N > 1$ divides $N_{CC} n$. Since $N$ and $N_{CC}$ do not share any prime factors, this implies that $N$ divides $n$, a
contradiction to $n < N$. We have thus shown that $\sigma$ is $(1, \{C\})$-rigid.

Consider any machine $N_1$ with $|R_1| < N$ or $\|\delta_1\| < N$. Such a machine must have $|P_1| < N$, and
hence by Theorem 5.4 the pair $(N_1, M_2^\sigma)$ is not a Nash equilibrium. On the other hand, it is straightforward
to verify that the sequence $\sigma$ is 2-irreducible. Thus, for any machine $N_2$ with $|R_2| < N$ or $\|\delta_2\| < N$, we
have $|P_2| < N$ and by Theorem 5.2 the machine is not a best response to $M_1^\sigma$. We conclude that the
pair $(M_1^\sigma, M_2^\sigma)$ is a lean equilibrium with respect to $|R|$ and $\|\delta\|$. □

Example 5.6 Consider, in the Prisoner's Dilemma, a finite action sequence of the form
$\sigma = k_{CD} \cdot (C, D), k_{DD} \cdot (D, D), k_{DC} \cdot (D, C)$, where $k_{CD}, k_{DD}, k_{DC} \geq 1$ and $\sigma$ is strongly enforceable. We use $k$ to denote the
length $k_{CD} + k_{DD} + k_{DC}$ of $\sigma$. We show that the pair $(M_1^\sigma, M_2^\sigma)$ is a lean equilibrium with respect to $|R|$
and $\|\delta\|$.

Let $N_1$ be a machine for player 1. Suppose that $N_1$ is a best response to $M_2^\sigma$. Then the pair $(N_1, M_2^\sigma)$
produces the action sequence $(s^t) = \langle\sigma\rangle$. We show that if $N_1$ has $|R_1| < k$ or $\|\delta_1\| < k$, then the pair
$(N_1, M_2^\sigma)$ is not a Nash equilibrium.

Let $Q_C$ denote the states of $N_1$ that output $C$, and let $Q_D$ denote the normal states of $N_1$ that output
$D$. It is straightforward to verify that for any two distinct $t_1, t_2 \in \{1, \ldots, k_{CD}\}$, the classes $[t_1]_s, [t_2]_s$ are
1-incompatible, and hence, we have $|Q_C| \geq k_{CD}$ by Proposition 5.1. We next show that $\sigma$ is $(1, \{D\})$-rigid.
Consider any rotation $\rho$ of $\sigma$ and a value $n$ with $1 \leq n < k$ and $\rho_1^1 = \rho_1^{n+1} = D$; in one of the sequences
$\rho' = \rho^1 \ldots \rho^n$, $\rho'' = \rho^{n+1} \ldots \rho^k$, player 1 uses only the action $D$ and hence one of the values $r_2(\rho'), r_2(\rho'')$
is strictly below 0. On the other hand, $r_2(\sigma)$ is strictly above 0 and can be written as the convex combination
of $r_2(\rho')$ and $r_2(\rho'')$, and so neither of $r_2(\rho'), r_2(\rho'')$ is equal to $r_2(\sigma)$, and we have that $\sigma$ is $(1, \{D\})$-rigid.
Now suppose that $N_1$ has $|R_1| < k$ or $\|\delta_1\| < k$. It follows that $|P_1| < k$; since $|Q_C| \geq k_{CD}$, this implies
that $|Q_D| < k_{DD} + k_{DC}$. By Theorem 5.4 we have that $(N_1, M_2^\sigma)$ is not a Nash equilibrium.

In a similar way, it can be shown that for any best response $N_2$ to $M_1^\sigma$, if $N_2$ has $|R_2| < k$ or $\|\delta_2\| < k$,
then the pair $(M_1^\sigma, N_2)$ is not a Nash equilibrium. We thus have that $(M_1^\sigma, M_2^\sigma)$ is a lean equilibrium with
respect to $|R|$ and $\|\delta\|$. □

So far, our discussion has focused on the complexity measures $|R|$ and $\|\delta\|$. We now turn our attention
to the complexity measure $|Q|$.

Example 5.7 As in the previous example, let $\sigma$ be a finite action sequence of the form
$\sigma = k_{CD} \cdot (C, D), k_{DD} \cdot (D, D), k_{DC} \cdot (D, C)$ for the Prisoner's Dilemma, where $k_{CD}, k_{DD}, k_{DC} \geq 1$ and $\sigma$ is strongly enforceable;
let $k$ denote the length $k_{CD} + k_{DD} + k_{DC}$ of $\sigma$. We give a pair of machines $(M_1, M_2)$, where each machine
has $k$ states, that is a lean equilibrium with respect to $|Q|$.

We define the machines $M_1, M_2$ as follows. Each machine has state set $Q_1 = Q_2 = \{1, \ldots, k\}$, initial
states $q_1^1 = q_2^1 = 1$, and has output function defined by $\lambda_i(n) = \sigma_i^n$ for all $n \in Q_i$. For the states $n \in Q_i$
where $\sigma_j^n = D$, we define $\delta_i(n, C) = \delta_i(n, D) = n + 1$, where $k + 1$ is understood to represent the state 1.
For the states $n \in Q_i$ where $\sigma_j^n = C$, we define $\delta_i(n, C) = n + 1$ and $\delta_i(n, D) = q_i^*$ where $q_i^*$ is the first
state where $\sigma_j^n = C$, that is, $q_1^* = k_{CD} + k_{DD} + 1$ and $q_2^* = 1$. The states $q_i^*$ can be thought of as "internal
threat" states. This construction is similar to that of [1, Page 1276, Case B].
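
The construction can be made concrete with a short simulation. The sketch below is an illustration under stated assumptions rather than the paper's own code: the case distinction in the transition function is read as depending on the action $\sigma_j^n$ prescribed for the other player, which is the reading that yields $q_1^* = k_{CD} + k_{DD} + 1$ and $q_2^* = 1$; players are indexed from 0; and the names `build_machine` and `play` are hypothetical.

```python
def build_machine(sigma, i):
    """Return (output, delta) for player i's machine (i is 0-based); states are 1..k."""
    j = 1 - i
    k = len(sigma)
    # q_i^*: the first state at which the other player j is supposed to cooperate.
    q_star = 1 + next(n for n in range(k) if sigma[n][j] == "C")
    output = {n: sigma[n - 1][i] for n in range(1, k + 1)}
    delta = {}
    for n in range(1, k + 1):
        nxt = n % k + 1  # k + 1 is identified with state 1
        delta[(n, "C")] = nxt
        # A defection is punished only where player j was supposed to cooperate.
        delta[(n, "D")] = nxt if sigma[n - 1][j] == "D" else q_star
    return output, delta


def play(m1, m2, steps):
    """Run the two machines against each other and return the action pairs played."""
    (out1, d1), (out2, d2) = m1, m2
    q1 = q2 = 1
    history = []
    for _ in range(steps):
        a1, a2 = out1[q1], out2[q2]
        history.append((a1, a2))
        q1, q2 = d1[(q1, a2)], d2[(q2, a1)]
    return history


# k_CD = 2, k_DD = 1, k_DC = 2, so k = 5.
sigma = [("C", "D")] * 2 + [("D", "D")] + [("D", "C")] * 2
m1, m2 = build_machine(sigma, 0), build_machine(sigma, 1)
print(play(m1, m2, 2 * len(sigma)) == sigma + sigma)
```

Under this reading, every transition back to $q_i^*$ stays inside the block of states where player $i$ defects, which is what makes those states act as internal threats.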

We now observe that in each machine $M_i$, the simple cycle $C_i$ that maximizes payoff to the other player
$j$ is the cycle naturally corresponding to $\sigma$, that is, $1 \to 2 \to \cdots \to k \to 1$; this cycle has $r_j(C_i) = r_j(\sigma)$. This
is clearly the unique payoff-maximizing simple cycle of length $k$. Also, in all shorter simple cycles, player
$i$ only defects, yielding player $j$ a payoff strictly less than 0.

Let $N_i$ be a best response to $M_j$. By the observation in the previous paragraph, the sequence $(q_j^t)$ must,
after some finite amount of time, be equal to the sequence $1, \ldots, k$ repeated infinitely. It is hence possible
to modify the machine $N_i$, by changing its initial state to a state that is played against the state $q_j = 1$ in the
mentioned infinite repetition, to obtain a machine $N_i'$ that, along with $M_j$, generates the sequence $\langle\sigma\rangle$, and
has the same number of states as $N_i$. Now, if $N_i'$ and $N_i$ have strictly fewer than $k$ states, then against $M_j$
they have strictly fewer than $k$ played states, and then by arguing as in Example 5.6 the pair $(N_i', M_j)$ is not
a Nash equilibrium, from which it follows that the pair $(N_i, M_j)$ is not a Nash equilibrium. We conclude
that $(M_1, M_2)$ is a lean equilibrium with respect to $|Q|$. □

We now establish a theorem that will help us to establish lean equilibrium results with respect to $|Q|$.
Let us say that $\sigma$ is $i$-foolable if there exists a rotation $\rho = \rho^1 \ldots \rho^k$ of $\sigma$ and an action $s' \in S_j$ such that for
all $n$ with $1 \leq n \leq k$, it holds that $r_j(\rho^n \rho^{n+1} \ldots \rho^{k-1} \rho') > r_j(\sigma)$, where $\rho'$ is the pair with player $i$ action
equal to $\rho_i^k$ and player $j$ action equal to $s'$.
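
For small sequences, foolability can be tested by exhaustive search over rotations and deviating actions. The sketch below is a minimal illustration under assumptions: the payoff table `U` holds illustrative Prisoner's Dilemma values that are not taken from the text, $r_j$ of a finite sequence is taken to be its average payoff to player $j$, the final pair $\rho'$ keeps player $i$'s action from $\rho^k$, and players are indexed from 0. The names `average` and `foolable_witnesses` are hypothetical.

```python
from fractions import Fraction

# Assumed Prisoner's Dilemma payoffs (u1, u2); illustrative values only, not from the text.
U = {("C", "C"): (2, 2), ("C", "D"): (-1, 3), ("D", "C"): (3, -1), ("D", "D"): (0, 0)}
ACTIONS = ("C", "D")


def average(seq, player):
    """Exact average payoff of a finite action sequence to the given player (0-based)."""
    return Fraction(sum(U[pair][player] for pair in seq), len(seq))


def foolable_witnesses(sigma, i):
    """Return every (rotation, s') pair witnessing that sigma is i-foolable (i is 0-based)."""
    j = 1 - i
    k = len(sigma)
    target = average(sigma, j)
    witnesses = []
    for start in range(k):
        rho = sigma[start:] + sigma[:start]
        for s in ACTIONS:
            # rho': player i keeps the action of rho^k, player j switches to s'.
            last = (rho[-1][0], s) if i == 0 else (s, rho[-1][1])
            fooled = rho[:-1] + [last]
            # Every suffix rho^n ... rho^(k-1) rho' must beat the payoff of sigma.
            if all(average(fooled[n:], j) > target for n in range(k)):
                witnesses.append((rho, s))
    return witnesses


# The sequence 2 . (C, C), 2 . (D, D).
sigma = [("C", "C")] * 2 + [("D", "D")] * 2
for rho, s in foolable_witnesses(sigma, 0):
    print(rho, s)
```

An empty result means that no rotation and action witness foolability under the assumed payoffs; a different payoff table can change the outcome.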

Theorem 5.8 Let $\sigma$ be a strictly enforceable finite action sequence, and let $M_j$ be a $\sigma$-machine. If $\sigma$ is
$i$-foolable (via $\rho$), and $N_i$ is a machine such that in $(N_i, M_j)$ it holds that $P_i = Q_i$ (that is, all states in $N_i$
are played), then $(N_i, M_j)$ is not a Nash equilibrium.

Proof. If $N_i$ is not a best response to $M_j$, we are done, so we assume that $N_i$ is a best response to $M_j$, in
which case we have $(s^t) = \langle\sigma\rangle$. Let $q_i$ be any state of $Q_i$. We claim that there is a path $P$ in $N_i$ from $q_i$ to a
state $q_i' \in Q_i$ such that $r_j(P) > r_j(\sigma)$. This suffices, since it implies that $M_j$ is not a best response to $N_i$.

We reason as follows. By hypothesis, the state $q_i$ is played and hence there exists $t \geq 1$ with $q_i = q_i^t$.
Since $(s^t) = \langle\sigma\rangle$, there exists $n$ with $1 \leq n \leq k$ where $\rho^n \rho^{n+1} \ldots \rho^k = s^t s^{t+1} \ldots s^{t+(k-n)}$. The desired
path $P$ starts at $q_i$ and has actions $\rho_j^n \ldots \rho_j^{k-1} s'$, where $s' \in S_j$ is the action from the definition of $i$-foolable;
by that definition, we have $r_j(P) > r_j(\sigma)$. □

Example 5.9 We return to the class of sequences considered in our first example, Example 5.3. Let
$N_C, N_D \geq 1$ be constants, and consider the finite action sequence $\sigma = N_C \cdot (C, C), N_D \cdot (D, D)$ in
the Prisoner's Dilemma; the sequence $\sigma$ is strictly enforceable, and has length $k = N_C + N_D$. We show that
the pair $(M_1^\sigma, M_2^\sigma)$ is a lean equilibrium with respect to $|Q|$. Note that this pair is not an Abreu-Rubinstein
equilibrium with respect to $|Q|$, since each of the machines has a threat state that is never played, and hence
each of the machines could be simplified without sacrificing payoff by removing this threat state.

To show that the described pair is a lean equilibrium, we show that, for each player $i$, when $N_i$ is a best
response to $M_j^\sigma$ with $k$ or fewer states, the pair $(N_i, M_j^\sigma)$ is not a Nash equilibrium. By the argumentation
in Example 5.3, any such best response $N_i$ must have at least $k$ played states. Hence in such a best response
$N_i$, all states are played; by Theorem 5.8 it thus suffices to show that the sequence $\sigma$ is $i$-foolable. It is
straightforward to verify that $\sigma$ is $i$-foolable via the rotation $\rho = N_D \cdot (D, D), N_C \cdot (C, C)$ and the action
$D \in S_j$. □

Example 5.10 We reconsider the sequences treated in Example 5.5. Let $\sigma = N_{CC} \cdot (C, C), N_{CD} \cdot (C, D)$,
where $N_{CC} > 0$ and $N_{CC}, N_{CD}$ do not share any prime factors. We show that the pair $(M_1^\sigma, M_2^\sigma)$ is a lean
equilibrium with respect to $|Q|$. This pair is not an Abreu-Rubinstein equilibrium with respect to $|Q|$: the
first machine could be simplified to a machine that only outputs $C$ without giving up payoff, and the second
machine could eliminate its threat state without giving up payoff.

We show that the sequence $\sigma$ is both 1-foolable and 2-foolable. We have 1-foolability by the rotation
$(N_{CC} - 1) \cdot (C, C), N_{CD} \cdot (C, D), (C, C)$ and the action $D \in S_2$, and we have 2-foolability by the rotation
$N_{CD} \cdot (C, D), N_{CC} \cdot (C, C)$ and the action $D \in S_1$.

We can now argue that the pair $(M_1^\sigma, M_2^\sigma)$ is a lean equilibrium with respect to $|Q|$. The structure of the
argument is similar to that of the previous example. Consider a player $i$ and a best response $N_i$ to $M_j^\sigma$ with
$k$ or fewer states. It is shown in Example 5.5 that if $N_i$ has strictly fewer than $k$ played states, then $(N_i, M_j^\sigma)$
is not a Nash equilibrium. In the case that $N_i$ has exactly $k$ played states, all of its states are played and then
$(N_i, M_j^\sigma)$ is not a Nash equilibrium by Theorem 5.8. □

The results in the last three examples demonstrate different types of payoffs that are sustainable by lean
equilibria with respect to $|Q|$ in the repeated Prisoner's Dilemma. In particular, Example 5.7 shows that
any strictly enforceable payoff profile in the interior of the convex hull of the points $u(C, D)$, $u(D, D)$, and
$u(D, C)$ is a payoff attainable by such a lean equilibrium. These results can be contrasted strongly with
the results of Abreu and Rubinstein [1, Section 5] showing that the payoffs of Abreu-Rubinstein equilibria
in this context are the strictly enforceable payoffs that are convex combinations of the diagonals, that is,
convex combinations of $u(C, C)$ and $u(D, D)$ and convex combinations of $u(C, D)$ and $u(D, C)$.

6 Structure of lean equilibria 

In this section, we present results describing the structure of lean equilibria in machine games
$G^m = (\mathcal{M}_1, \mathcal{M}_2, r_1, r_2)$ with respect to the complexity measures $|R|$ and $\|\delta\|$. Our first result demonstrates that
the sequence $(q^t)$ begins with a sequence of state pairs where each state is used only once, followed by a
state pair where each state is used infinitely often.

Lemma 6.1 (1-$\infty$ Lemma) Suppose that $(M_1, M_2)$ is a lean equilibrium of $G^m$ with respect to one of the
complexity measures $|R|$, $\|\delta\|$ having a strictly enforceable payoff profile. Let $u \geq 1$ be the minimum value
such that one of the states $q_1^u, q_2^u$ is used later in the sequence $(q^t)$, that is, such that there exists $i \in \{1, 2\}$
such that $q_i^u \in \{q_i^t \mid t > u\}$. Then, for each $i \in \{1, 2\}$, the state $q_i^u$ appears infinitely often in the sequence
$(q_i^t)$.

Proof. We prove this by contradiction. Suppose one or both of the states $q_1^u, q_2^u$ appears finitely often in the
respective sequence $(q_i^t)$. Let $T_i(q)$ denote the set $\{t \mid q = q_i^t\}$, that is, the points in time where player $i$
plays state $q$. Observe that $T_1(q_1^u) \cap T_2(q_2^u) = \{u\}$, for if this intersection contains two distinct elements,
there must be infinitely many points $t$ such that $q^t = q^u$.

We claim that the players can be labelled as $i', i''$ in such a way that: (1) $T_{i'}(q_{i'}^u)$ is finite, and (2) there
exists $v \in T_{i''}(q_{i''}^u)$ such that $v > \max T_{i'}(q_{i'}^u)$.

We establish this claim as follows. If both sets $T_1(q_1^u), T_2(q_2^u)$ are finite, set $v = \max(T_1(q_1^u) \cup T_2(q_2^u))$
and let $i''$ be the unique element in $\{1, 2\}$ such that $v \in T_{i''}(q_{i''}^u)$. If one of the sets $T_1(q_1^u), T_2(q_2^u)$ is finite
and the other is infinite, let $i'$ be the player in $\{1, 2\}$ such that $T_{i'}(q_{i'}^u)$ is finite; since $T_{i''}(q_{i''}^u)$ is infinite, it is
possible to select a value $v$ satisfying condition (2).

For the sake of notation, we now assume that $i' = 1$ and $i'' = 2$. We want to show that $(M_1, M_2)$ is not
a lean equilibrium. If $(M_1, M_2)$ is not a Nash equilibrium, we are done, so we assume that it is. Starting
from $M_1$, we define a new machine $M_1'$ as follows. We set $\delta_1'(q_1^{u-1}, s_2^{u-1}) = q_1^v$, or, in the case that $u = 1$,
we set the initial state of $M_1'$ to be $q_1^v$. We then have that

$q_1^1 \to q_1^2 \to \cdots \to q_1^{u-1} \to q_1^v \to q_1^{v+1} \to \cdots$

is a path in $M_1'$. We modify $M_1'$ so that, other than the transitions in this path, there are no transitions to the
states $q_1^1, \ldots, q_1^{u-1}$; we reroute the transitions to these states to a threat state. We also eliminate the state $q_1^u$,
rerouting the transitions to it to a threat state. The state sequence induced by the machine pair $(M_1', M_2)$
is $q^1, q^2, \ldots, q^{u-1}, q^v, q^{v+1}, q^{v+2}, \ldots$ and hence the payoffs to each of the two players are the same as in
$(M_1, M_2)$.

We show that $(M_1', M_2)$ is a Nash equilibrium by arguing that player 2 cannot obtain a strictly better
payoff. Let $C$ be any cycle in $M_1'$. None of the states $q_1^1, \ldots, q_1^u$ can appear in $C$; since all modified
transitions involved these states, the cycle $C$ is also a cycle of $M_1$. As $M_2$ was a best response to $M_1$, it is a
best response to $M_1'$.

We now need only argue that $M_1'$ is simpler than $M_1$. In $(M_1, M_2)$, the state $q_1^u$ is a played state of $M_1$,
so by Proposition 4.1 we have $|R_1'| < |R_1|$. Also, the state $q_1^u$ contributes at least 1 to the value $\|\delta_1\|$, a
contribution not present in the calculation of $\|\delta_1'\|$, so $\|\delta_1'\| < \|\delta_1\|$. □

Lemma 6.2 Suppose that $(M_1, M_2)$ is a lean equilibrium of $G^m$ with respect to one of the complexity
measures $|R|$, $\|\delta\|$ having a strictly enforceable payoff profile, and suppose that $\equiv_i$ is contained in $\equiv_s$ (for
some $i \in \{1, 2\}$). Then, the equivalence relations $\equiv_s$, $\equiv_i$ are equal.

Proof. We prove this by contradiction. Suppose there are two values $t'' < t'$ such that $t'' \equiv_s t'$ but $t'' \not\equiv_i t'$.
Without loss of generality, we assume that $i = 1$, and so we have $q_1^{t''} \neq q_1^{t'}$. We want to show that $(M_1, M_2)$
is not a lean equilibrium; if $(M_1, M_2)$ is not a Nash equilibrium, we are done, so we assume that it is.

Define $M_1'$ to be the machine equal to $M_1$, but where the state $q_1^{t''}$ is eliminated and all transitions to $q_1^{t''}$
from states in $R_1 \setminus \{q_1^{t''}\}$ are changed to transitions to $q_1^{t'}$.

The machine $M_1'$ is simpler than $M_1$ with respect to the complexity measure $|R|$, as it has one fewer
played state than $M_1$ (see Proposition 4.1). It is also simpler than $M_1$ with respect to $\|\delta\|$: the number of
normal transitions out of states in $R_1'$ is equal to that of $R_1$, but the state $q_1^{t''}$ (in $M_1$) has at least one normal
transition, as it is played in $(M_1, M_2)$.

We claim that $(M_1', M_2)$ is a Nash equilibrium of $G^m$, which suffices. Define $f([t]_s)$ as $\{q_1^u \in Q_1' \mid u \in [t]_s\}$.
Let $(q'^t)$, $(s'^t)$ be the sequences induced by the machines $(M_1', M_2)$. We prove by induction that for all
$t \geq 1$, it holds that $q_1'^t \in f([t]_s)$, $q_2'^t = q_2^t$, and $s'^t = s^t$. The base case is clear, so assume that the claim holds on
$t \geq 1$. Then $q_1'^t = q_1^{t'}$ for some $t' \equiv_s t$. As $s_2'^t = s_2^t = s_2^{t'}$, we have $q_1'^{t+1} = \delta_1'(q_1'^t, s_2'^t) = \delta_1'(q_1^{t'}, s_2^{t'}) = q_1^{t'+1}$.
This implies that $q_1'^{t+1} \in f([t'+1]_s) = f([t+1]_s)$. As $\lambda_1(q_1'^{t+1}) = \lambda_1(q_1^{t'+1}) = \lambda_1(q_1^{t+1}) = s_1^{t+1}$, we have
$s_1'^{t+1} = s_1^{t+1}$ and hence $s'^{t+1} = s^{t+1}$.

Thus, to show that $(M_1', M_2)$ is a Nash equilibrium of $G^m$, it suffices to show that $M_2$ cannot deviate to
obtain a strictly higher payoff. Suppose that $p_1 \to p_2 \to \cdots \to p_m \to p_1$ is an $M_1'$-cycle $C$ giving $M_2$ a payoff
$r > r_2(M_1, M_2) = r_2(M_1', M_2)$. By the choice of the values $t'', t'$, there is a path $P$ in $M_1$ from $q_1^{t''}$ to $q_1^{t'}$
whose $M_2$-payoff is $r_2(M_1, M_2)$. For each state-action pair $(p_j, a_j)$ in the cycle $C$ with $\delta_1(p_j, a_j) = q_1^{t''}$
(and hence $\delta_1'(p_j, a_j) = q_1^{t'}$), replace $p_{j+1} = q_1^{t'}$ with the path $P$. In this way, we obtain an $M_1$-cycle
whose payoff is also strictly greater than $r_2(M_1, M_2) = r_2(M_1', M_2)$, contradicting that $(M_1, M_2)$ is a
Nash equilibrium of $G^m$. □

With respect to a machine pair $(M_1, M_2)$, we let $P_i$ denote the set of played states of $M_i$.

Lemma 6.3 Suppose that $(M_1, M_2)$ has a strictly enforceable payoff profile. If $(M_1, M_2)$ is a lean
equilibrium with respect to $|R|$, then $|R_1| = |R_2| = |P_1| = |P_2|$, and if $(M_1, M_2)$ is a lean equilibrium with
respect to $\|\delta\|$, then $\|\delta_1\| = \|\delta_2\| = |P_1| = |P_2|$.

Proof. We claim that, starting from a Nash equilibrium $(M_1, M_2)$ with a strictly enforceable payoff profile,
player $j$ has a best response $M_j'$ to $M_i$ where $(M_i, M_j')$ is a Nash equilibrium and such that $|R_j'| \leq |P_i|$ and
$\|\delta_j'\| \leq |P_i|$. This implies, in the case of a lean equilibrium with respect to $|R|$, that
$|R_2| \leq |P_1| \leq |R_1| \leq |P_2| \leq |R_2|$, and similarly, in the case of a lean equilibrium with respect to $\|\delta\|$, that
$|P_2| \leq \|\delta_2\| \leq |P_1| \leq \|\delta_1\| \leq |P_2|$. The claim is argued as follows. The payoffs $r_1(M_1, M_2), r_2(M_1, M_2)$ are equal to the payoffs
to the two players of a cycle $C$ in machine $M_i$. The cycle $C$ is not necessarily a simple cycle, but if it is
not simple, it can be viewed as the concatenation of two shorter cycles. Each of the two shorter cycles must
give the same payoff to player $j$ (otherwise $(M_1, M_2)$ would not be a Nash equilibrium, as player $j$ could
profitably deviate). We choose the cycle out of the two shorter cycles that gives the higher payoff to player
$i$. We then iterate this process until we obtain a simple cycle $C'$. The simple cycle has $r_j(C') = r_j(C)$ and
$r_i(C') \geq r_i(C)$. It is possible to implement a player $j$ machine $M_j'$ that repeatedly walks the cycle $C'$ in $M_i$
satisfying the stated inequalities: this is done by taking a machine that simply walks a shortest path from the
initial state of $M_i$ to a state in the cycle $C'$, and then repeatedly walks the cycle $C'$. The pair $(M_i, M_j')$ is a
Nash equilibrium: the machine $M_j'$ obtains the same payoff as the machine $M_j$, and the machine $M_i$ could
only profitably deviate by playing a threat state; but since his payoff is greater than or equal to his payoff in
$(M_i, M_j)$, this is not beneficial as his payoff is strictly above his minmax payoff. □

We now present our main structure theorem. This theorem not only describes the structure of machines
at lean equilibrium with respect to the measure $\|\delta\|$, but shows that their structure can be derived solely
from the equivalence relation $\equiv_s$, and hence just from the action sequence $(s^t)$; this implies that a
third-party observer that only views the resulting action sequence can infer the structure of the machines. In order
to give the statement, we introduce the following notion. Define a rho-machine to be a machine $M$ where
each normal state $q$ reachable from the initial state has exactly one outgoing transition to a normal state;
denote this successor state by $s(q)$. We call the set of states occurring finitely often in the sequence $q^1$,
$s(q^1)$, $s^2(q^1)$, ... the tail of the machine, and the other states (those occurring infinitely often) the head of
the machine.
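
The tail and head of a rho-machine can be computed by following the successor function from the initial state until a state repeats. The sketch below is a minimal illustration; the dictionary encoding of the successor function and the name `tail_and_head` are assumptions made here, not notation from the text.

```python
def tail_and_head(initial, successor):
    """Split the states visited from `initial` under `successor` into (tail, head)."""
    order = []  # states in order of first visit
    seen = {}   # state -> position of its first visit
    q = initial
    while q not in seen:
        seen[q] = len(order)
        order.append(q)
        q = successor[q]
    entry = seen[q]  # position at which the cycle is entered
    return order[:entry], order[entry:]


# A rho-shaped machine: 1 -> 2 -> 3 -> 4 -> 5 -> 3 -> ...
succ = {1: 2, 2: 3, 3: 4, 4: 5, 5: 3}
print(tail_and_head(1, succ))
```

States before the first repeated state are visited exactly once and form the tail; the remaining states lie on the cycle and form the head.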

Theorem 6.4 Suppose that $(M_1, M_2)$ is a lean equilibrium of $G^m$ with respect to the complexity
measure $\|\delta\|$ having a strictly enforceable payoff profile. Then, the equivalence relations $\equiv_s$, $\equiv_q$, $\equiv_1$, $\equiv_2$ are
all equal, and for each $i \in \{1, 2\}$, the machine $M_i$ is a rho-machine having exactly one state $q_i([t]_s)$
that is played at all time points $[t]_s$ for each equivalence class $[t]_s$ of $\equiv_s$, whose structure is given by
$\delta_i(q_i([t]_s), s_j^t) = q_i([t+1]_s)$ for all $t \geq 1$.

Proof. By Lemma 6.3, in each of the machines $M_1, M_2$, each played state has exactly one outgoing normal
transition which is to another played state. Thus, each of the machines is a rho-machine. By Lemma 6.1
the machines have the same tail size and the same head size, and thus the equivalence relations $\equiv_1$ and $\equiv_2$
are the same, from which it follows (by definition of $\equiv_q$) that the equivalence relations $\equiv_1$, $\equiv_2$, and $\equiv_q$ are
all the same. Since $\equiv_q$ is always contained in $\equiv_s$, we can invoke Lemma 6.2 to obtain that $\equiv_1$ and $\equiv_2$ are
each equal to $\equiv_s$, and we have that all four of the equivalence relations are equal.

For each $i \in \{1, 2\}$, by the equivalence of $\equiv_i$ and $\equiv_s$, the machine $M_i$ has one played state for each
equivalence class of $\equiv_s$. As already noted, each played state has exactly one outgoing normal transition
which is to another played state, and so the machine must be a rho-machine with the described structure. □

7 Discussion 

We introduced and studied the notion of lean equilibrium, a particular form of Nash equilibrium where 
strategies cannot be further simplified according to a cautious simplification procedure: a player simplifies 
only if post-simplification, the strategy vector will be a Nash equilibrium. It is possible to consider similar 
equilibrium notions relative to even more cautious simplification procedures: for instance, a player might 
anticipate simplifications of other players, and only want to simplify if, in addition to preserving Nash 
equilibrium, he will not lose payoff if other players simplify following his simplification. A variant of 
this idea would have a player simplifying if he will not lose payoff in the case that other players change 
best response following his simplification. We leave the investigation of these equilibrium notions to future 
work. The broad research direction that we hope to have identified in the present work is that of investigating 
notions of equilibria where players prefer simple strategies, but where the desire for simplicity is not wired 
directly into the players' payoffs. 